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Abstract —The goal of hyperspectral unmixing is to decompose 
an electromagnetic spectral dataset measured over M spectral 
bands and T pixels into TV constituent material spectra (or “end- 
members”) with corresponding spatial abundances. In this paper, 
we propose a novel approach to hyperspectral unmixing based 
on loopy belief propagation (BP) that enables the exploitation of 
spectral coherence in the endmembers and spatial coherence in 
the abundances. In particular, we partition the factor graph into 
spectral coherence, spatial coherence, and bilinear subgraphs, 
and pass messages between them using a “turbo” approach. 
To perform message passing within the bilinear subgraph, we 
employ the bilinear generalized approximate message passing 
algorithm (BiG-AMP), a recently proposed belief-propagation- 
based approach to matrix factorization. Furthermore, we propose 
an expectation-maximization (EM) strategy to tune the prior 
parameters and a model-order selection strategy to select the 
number of materials TV. Numerical experiments conducted with 
both synthetic and real-world data show favorable unmixing 
performance relative to existing methods. 

Index Terms —hyperspectral imaging, approximate message 
passing, belief propagation, expectation-maximization algorithms 


I. Introduction 

In hyperspectral unmixing (HU), the objective is to jointly 
estimate the spectral signatures and per-pixel abundances of 
the TV materials present in a scene, given measurements across 
M spectral bands at each of T = Tj x T% pixels. Often, a 
linear mixing model 0, 0 is assumed, in which case the 
measurements Y £ R MxT are modeled as 

Y = SA + W, (1) 

where the nth column of S € K j f x A ' represents the spectrum 
(or “endmember”) of the nth material, the nth row of A £ 
R^ xT represents the spatial abundance of the nth material, 
and W represents noise. Both S and A must contain only 
non-negative (NN) elements, and each column of A must obey 
the simplex constraint (i.e., NN and sum-to-one). Recently, 
nonlinear mixing models have also been considered (e.g., II, 
0), although such models lie outside of the scope of this 
paper. 
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Traditionally, hyperspectral unmixing is a two-step proce¬ 
dure, consisting of endmember extraction (EE) to recover the 
endmembers followed by inversion to recover the abundances. 
Many EE algorithms leverage the “pure pixel ” assumption: 
for each material, there exists at least one observed pixel 
containing only that material (i.e., all columns of the TV x TV 
identity matrix can be found among the columns of A). Well- 
known examples of pure-pixel-based EE algorithms include 
N-FINDR |6l and VCA 0. The existence of pure pixels in 
HU is equivalent to “ separability ” in the problem of non¬ 
negative matrix factorization (NMF), where the goal is to find 
S £ R+ /xW and A £ R^ xT matching a given Z = SA. 
There, separability has been shown to be sufficient for the 
existence of unique factorizations 0 and polynomial-time 
solvers 0, with a recent example being the FSNMF algorithm 
from m. In HU, however, the limited spatial-resolution of 
hyperspectral cameras implies that the pure-pixel assumption 
does not always hold in practice. With “mixed pixel” scenarios 
in mind, algorithms such as Minimum Volume Simplex Anal¬ 
ysis (MVSA) d and Minimum Volume Enclosing Simplex 
(MVES) ca attempt to find the minimum-volume simplex 
that contains the data Y. 

In the inversion step, the extracted endmembers in S are 
used to recover the simplex-constrained abundances in A. 
Often this is done by solving fl3l . lfT4ll 

A = argmin ||V — iSA|||, s.t. = lj., (2) 
A>0 

where ljy denotes the TV x 1 vector of ones, which is usually 
referred to as fully constrained least squares (FCLS). 

Real-world hyperspectral datasets can contain significant 
structure beyond non-negativity on s mn and simplex con¬ 
straints on {a n t}n—i- F° r example, the abundances {a n t}n=i 
will be sparse if most pixels contain significant contributions 
from only a small subset of the TV materials. Also, the 
abundances {a n t}f = i will be spatially coherent if the presence 
of a material in a given pixel makes it more likely for that 
same material to exist in neighboring pixels. Likewise, the 
endmembers {s mn }^f =1 will be spectrally coherent if the 
radiance values are correlated across frequency. 

Various unmixing algorithms have been proposed to lever¬ 
age these additional structures. For example, given an end- 
member estimate S, the SUnSAL algorithm ESI estimates 
sparse abundances A using t \-regularized least-squares (LS), 
and the SUnSAL-TV algorithm H6l adds total-variation (TV) 
regularization El to also penalize changes in abundance 
across neighboring pixels (i.e., to exploit spatial coherence). 
SUnSAL and SUnSAL-TV can be categorized as unmixing 
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algorithms, rather than inversion algorithms, since their t\- 
regularization supports the use of large (i.e., N > M ) and 
scene-independent endmember libraries for S. However, there 
are limitations on the size of the library S, and it can be 
difficult to determine suitable choices for the £\ and TV 
regularization weights. 

Traditional NMF techniques have also been enhanced to 
account for spectral and spatial coherence. For instance, 
the £ 1 / 2 NMF (L^NMF) lH8l algorithm promotes sparse 
abundances by adding ^1/2 regularization to the traditional 
NMF formulation. The algorithms in fl9l - l2ll then expand 
on this idea by adding additional regularizations to promote 
coherent abundances. For example, the Substance Dependence 
constrained NMF (SDSNMF) lfl9l , which was shown in l(l9l 
to perform the best out of ED-ED, employs a sparse pixel- 
by-pixel weighting matrix that accounts for similarities in 
the abundances across in the scene. Additional coherence- 
promoting NMF techniques include a method based on hier¬ 
archical rank-2 decompositions [22), a method that promotes 
both abundance separability and coherence ll23l . and a piece- 
wise spectral/spatial smoothness constrained method l24l . 

Bayesian approaches to hyperspectral unmixing have also 
been proposed. For example, the Bayesian Linear Unmix¬ 
ing (BLU) algorithm 1 251 1 employs priors that enforce NN 
constraints on the endmembers and simplex constraints on 
the per-pixel abundances, and returns either (approximately) 
minimum mean-square error (MMSE) or maximum a poste¬ 
riori (MAP) estimates using Gibbs sampling. The Spatially 
Constrained Unmixing (SCU) [26) algorithm, an extension of 
BLU, furthermore exploits spatial coherence using a hierar¬ 
chical Dirichlet-process prior. Both BLU and SCU have been 
shown to outperform N-FINDR and VCA-plus-FCLS under 
certain conditions |26) . but at the cost of several orders-of- 
magnitude increase in runtime. 

In this paper, we propose a novel empirical-Bayesian ap¬ 
proach to HU that is based on loopy belief propagation (LBP) 
lf27l . Our approach, referred to as HU turbo-AMP (HUT- 
AMP), simplifies the intractable task of LBP on the entire 
factor graph (see Fig. [U by partitioning it into three subgraphs: 
one that models spectral coherence (using N Gauss-Markov 
chains), one that models spatial coherence (using N binary 
Markov Random Fields (MRFs)), and one that models the NN 
bilinear structure of 0. While the first two subgraphs yield 
inference problems that are handled efficiently by standard 
methods |28) . lf29l , the third does not. Thus, to perform 
efficient inference on the latter subgraph, we apply the recently 
proposed Bilinear Generalized Approximate Message Passing 
(BiG-AMP) algorithm 1301 . BiG-AMP can be interpreted as 
an extension of approximate message passing (AMP) tech¬ 
niques GD-ED, originally proposed for the linear observation 
models that arise in compressive sensing, to bilinear models 
like 0. To merge BiG-AMP-based inference with Markov- 
chain and MRF-based inference, we leverage the “turbo AMP” 
approach first proposed in f34l and subsequently applied to 
joint channel-estimation and decoding |[35l , 1361 , compressive 
image retrieval ED, El, and compressive video retrieval 
(39), all with state-of-the-art results. In formulating our statis¬ 
tical model, we treat the parameters of the prior distributions as 


deterministic unknowns and estimate them from the data using 
the expectation-maximization (EM) algorithm, building on the 
NN sparse reconstruction work in [40). As such, our approach 
can be classified as empirical Bayesian ED. Lastly, when the 
number of materials N is unknown, we show how it can be 
accurately estimated using a classical model-order selection 
(MOS) strategy 142) . The resulting algorithm has the following 
desirable features: 1) it requires no tuning parameters, 2) it 
exploits both spectral and spatial coherence, and 3) it uses a 
computationally efficient inference procedure. 

We evaluate the performance of our proposed technique, 
in comparison to several recently proposed methods, through 
a detailed numerical study that includes both synthetic and 
real-world datasets. The results, presented in Sec. IIVI suggest 
that HUT-AMP yields an excellent combination of unmixing 
performance and computational complexity. 

Regarding novel contributions to HU models, we believe 
that our work (first presented in JD) is the first to use either 
of the following: i) Gauss-Markov chains to model spectral 
coherence in endmembers, ii) Bernoulli truncated-Gaussian 
mixtures to model abundance amplitudes. As for novel contri¬ 
butions to inference methodology, we believe that our work is 
the first to combine any of the following methods, and in fact 
we combine all four of them: i) compressed sensing with non¬ 
negative Bernoulli-Gaussian-mixture priors whose parameters 
are learned via EM EQ), ii) turbo compressed sensing that 
combines AMP with Markov-chain inference and learns the 
parameters via EM 139), iii) turbo compressed sensing that 
combines AMP with MRF inference and learns the parameters 
via EM EH, iv) bilinear AMP l30l . 

Notation: For matrices, we use boldface capital letters 
like A, and we use A J , tr(A), and ||.A||f to denote the 
transpose, trace, and Frobenius norm, respectively. For vectors, 
we use boldface small letters like x, and we use ||aj|| p = 
E„ |®n| p ) 1/,p to denote the i v norm, with x n = [x] n 
representing the n th element of x. We use 1 at to denote the 
N x 1 vector of ones. Deterministic quantities are denoted 
using serif typeface (e.g., x,x,X), while random quantities 
are denoted using san-serif typeface (e.g., X, X , X). For random 
variable X, we write the probability density function (pdf) as 
Px{x), the expectation as E{x}, and the variance as varjx}. 
For a Gaussian random variable X with mean m and variance 
v, we write the pdf as J\f(x ; m, v ) and, for the special case 
of A/”(at; 0,1), we abbreviate the pdf as ip(x) and write the 
complimentary cdf as <f> c (:r). Finally, we use S(x) (where 
x £ R) to denote the Dirac delta distribution and S n (where 
n £ Z) to denote the Kronecker delta sequence. 

II. Signal and Observation Models 
A. Background on BiG-AMP 

As described in the introduction, a distinguishing feature 
of our approach is the use of BiG-AMP ED for bilinear 
inference. We begin by overviewing BiG-AMP, since its 
operating assumptions affect the construction of our statistical 
model. 

Consider the problem of estimating the elements of the ma¬ 
trices S £ R MxAr and A £ R NxT from a noisy observation 
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Fig. 1. The factor graph for HUT-AMP for the toy-problem dimensions M = 3, TV = 2, and T = 4. Circles represent random variables and dark squares 
represent pdf factors. Each elongated bar in the left subgraph conglomerates the factors associated with an M-variable Markov chain (detailed in Fig. 0, 
while each square in the right subgraph conglomerates the factors associated with a T\ x T 2 -pixel Markov random field (detailed in Fig. 0. 


Y £ R MxT of the hidden bilinear form Z = SA € R MxT . 
(Our use of overbar notation will become clear in the sequel.) 
Suppose that the elements of both S and A can be modeled 
as independent random variables S rnn and a„ t with known 
prior pdfs pg (•) and pg (•), respectively, with S mn being 
zero-mean. Suppose also that the likelihood function of Z is 
known and separable, i.e., of the form 

M T 

Py\z( Y \ Z ) = n W_Py m t\ZmSymt\ Z mt)- (3) 

m= 1 1 =1 


Finally, suppose that the dimensions M, N , T are sufficiently 
large. In this case, approximations of the marginal posterior 
pdfs of S mn , &nt, and z mt can tractably be computed using 
loopy belief propagation (LBP) |27) , and in particular using 
an approximation of the sum-product algorithm (SPA) l43l 
known as BiG-AMP l30l . More precisely, BiG-AMP approx¬ 
imates the marginal posterior pdf of S mn as 


P s_ 




>\Y) = 


Ps m „( s mn)A/ r ( S mn j gmn, ^mn) 

I Ps rrl J, S mn)-N'( s mn'i Qmn, l/ mn)ds' r 


(4) 

where the parameters q mn and v c j nn are iteratively updated at 
each BiG-AMP iteration; similar approximations are made for 
the marginal posteriors of a nt and Z mt . BiG-AMP also com¬ 
putes the means and variances of these approximate marginal 
posteriors at each iteration, yielding approximate MMSE es¬ 
timates of S mn , a nt , and Z mt , as well as approximations of 
their corresponding MSEs. For many priors of interest (e.g, 
the ones used in this paper), these means and variances can 
be computed in closed form. 

In the big picture, BiG-AMP can be understood as a recent 
generalization of the AMP methods GU-GD from linear to 
bilinear inference. These AMP methods can be derived by 
starting with the SPA and applying i) central-limit-theorem 
arguments that approximate all messages as Gaussian and 
ii) Taylor-series approximations that reduce the number of 


messages. Under additional independence and sub-Gaussianity 
assumptions, these AMP methods can be analyzed in the large- 
system limit, where their behavior is fully characterized by a 
state evolution 104). When the state evolution has a unique 
fixed point, the posterior approximations computed by AMP 
are in fact Bayes-optimal in the large-system limit 04]]. For 
finite-sized problems, the fixed points of AMP are known 
to coincide with the stationary points of a particular Bethe 
free energy approximation El, 06|. For a more detailed 
description of how AMP methods fit into the larger family 
of variational Bayesian methods, we refer the reader to the 
recent tutorial 07). For a detailed derivation of BiG-AMP, we 
refer the reader to |30l. 

BiG-AMP’s complexity is in general dominated by ten 
matrix multiplies (of the form S'A) per iteration, al¬ 
though simplifications can be made in the case of Gaussian 
Py AzrJymt^mt) that reduce the complexity to three matrix 
multiplies per iteration lf30l . Furthermore, when BiG-AMP’s 
likelihood function and priors include unknown parameters 
FI, expectation-maximization (EM) methods can be used to 
learn them, as described in l30l . BiG-AMP was shown 01 
to yield excellent performance on matrix completion, robust 
PCA, and dictionary learning problems, and here we show that 
it performs very well on the NMF and HU problems as well. 


B. Augmented Observation Model 

We model the elements of the mth row of the additive noise 
matrix W in CO) as i.i.d zero-mean Gaussian with variance 
ijj m > 0. Thus, the BiG-AMP marginal likelihoods take the 
form py m | 2m (y mt \z mt ) = ; V'm)■ For now we treat 

i/j as known, but later (in Sec. IHI-Cb we describe how it and 
other model parameters can be learned from Y. 

Leveraging the zero-mean property of the noise, we first 
perform mean-removal on the observations Y. In particular. 
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we subtract the empirical mean 


A 

M = 


MT 


T M 

Y J Y,Vmt = 


t— 1 m— 1 


mt 1mY1t 


(5) 


from Y to obtain 


Y = Y-,i1 m 1 J t 

= (, S-ii1 m 1 T n )A + W , 

S. v ✓ 

4s 


( 6 ) 

(7) 


where 0 employed ® and 1 \A = 1^, the latter of which 
results from the simplex constraint on the columns of A. It 
can then be shown (see Appendix 0 that the elements of S_ 
in 0 are approximately zero-mean. 

To enforce the linear equality constraint 1 ] V A = l}., we 
augment the observation model 0 into the form 



( 8 ) 


For the augmented model ®, the likelihood function of Z = 
SA takes the form in ® with 


Py m \z m {Vmt\ Z mt) — \ _ . 

Zmt) 


■V \lJmti z mt^m) m —1, . . . , M 


m = M+ 1. 


(9) 


— hmt(Zmt) 


We note that, ignoring spectral and spatial coherence, the 
model ® is appropriate for the application of BiG-AMP, since 
the likelihood function py^iXlZ) is known (up to i/)) and 
separable, and since the elements in S and A can be treated 
as independent random variables with priors known up to a 
set of parameters, with those in S being approximately zero- 
mean. In the sequel, we describe how the model ® can be 
extended to capture spectral and spatial coherence. As we will 
see, this will be done through the introduction of additional 
variables that allow S and A to be treated as conditionally 
independent. 


C. Endmember Prior 

We desire a model that promotes spectral coherence in the 
endmembers, i.e., correlation among the (mean removed) spec¬ 
tral amplitudes {s mn }^f =1 of each material n. However, since 
BiG-AMP needs S mn to be independent, we cannot impose 
correlation on these variables directly. Instead, we introduce an 
auxiliary sequence of correlated amplitudes {e ran }^ =1 such 
that S mn are independent conditional on e mn . In particular, 

M N 

Ps\e{S]E) = nn Ps\e(§.mn | e mn) (10) 

ra=1n= 1 

Ps\ei§.mn\ e mn) d(§.mn e mn) i (H) 

v y v 

fmn\Smm &mn ) 


p(ein) p(e 2n \ein) p(e^ n \e 2 n) p(e 4n|e3n) 



Fig. 2. Factor graph for the stationary first-order Gauss-Markov chain used 
to model coherence in the spectrum of the n th endmember, shown here for 
M = 4 spectral bands. Incoming messages from BiG-AMP flow downward 
into the e mn nodes, and outgoing messages to BiG-AMP flow upward from 
the emn nodes. 


implying that e mn is merely a copy of S mn . To impart 
correlation within the auxiliary sequences {e mn }^f =1 , we 
model them as independent Gauss-Markov models 

N M 

Pe(B') = | p(ei n ) | p(e mn |e m _i ?n ), (12) 

n— 1 m=2 

V -v-' 

— Pe n ( e n) 

where e n = [e in ,..., e M n] T , e n = [e ln , e Mn ] T , and 

P(^ln) *V(e mn , Av n , CT n ) (13) 

p[c r nn |^m—l,n) *V^e mn , (1 l,n A ' 

(14) 

In (O-d), K n £ R. controls the mean of the ?ith process, er^ 
controls the variance, and rj n £ [0,1] controls the correlation. 
The resulting factor graph is illustrated in Fig. [2] 

We note that the model (IT3l) - (IT4l) does not explicitly enforce 
non-negativity in s mn because, for simplicity, we have omitted 
the constraint s mn > —//. Enforcement of s mn > —ft, could be 
accomplished by replacing the pdfs in (IT3l)- (IT4li with truncated 
Gaussian versions, but the computations required for inference 
would become much more tedious. In our experience, this 
tedium is not warranted: with practical HU datasetsflit suffices 
to enforce non-negativity in A and keep Y r; SA. 


D. Abundance Prior 


We desire a model that promotes both sparsity and spatial 
coherence in the abundances a n t- To accomplish the latter, 
we impose structure on the support of {a nt}t=i for each 
material n. For this purpose, we introduce the support variables 
d n t 6 {—1,1}, where d„t = —1 indicates that a,it is zero¬ 
valued, and d„t = 1 indicates that a ra t is non-zero with 
probability 1, which we will refer to as “active.” By modeling 
the abundances a nt as independent conditional on d„t, we 
comply with the independence assumptions of BiG-AMP. In 
particular, we assume that 


N T 


q Pa„|d„ {p, n t\d n t) 

05) 

a=l t =1 


f ^(pnt) dnt = 1 

d n t = 1 

(16) 

— Qnt{p , nti d n t ) 



where £„(•) denotes the pdf of a„t when active. Essentially, 
we employ a Bernoulli-^}-) distribution for the ;/th material. 


'Throughout our numerical experiments, the proposed inference method 
never produced a negative estimate of S mn . 
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Fig. 3. Factor graph for the Ising MRF used to model spatial coherence in 
the support of the n th abundance map, here for T = 3x3 pixels. Incoming 
messages from BiG-AMP flow diagonally upward into the d n t nodes, and 
outgoing messages to BiG-AMP flow diagonally downward from the d n t 
nodes. 

We then place a Markov random field (MRF) prior on the 
support of the nth material, d n = [d ra i, . • • ,d n T] T : 

N 

Pd(D) = Y[pd n (dn) (17) 

n= 1 

Pd n (d , n') tX exp | y ' | “ y ' findni tX n j d n t J , (18) 

V t=i V iev t J J 

where V t C {1 ,... ,T}\t denotes the neighbors of pixel t. 
Roughly speaking, larger /3 n yields higher spatial coherence 
and larger a n yields higher sparsity. For simplicity, we adopt 
a neighborhood structure corresponding to the classical Ising 
model l28l . as illustrated by the factor graph in Fig. 0 

As for the active abundances, we adopt a non-negative 
Gaussian mixture (NNGM) distribution for £„(•): 

L 

C n(a) (19) 

e=i 

where > 0 and ^nt = 1- l fl <ED- A/+ refers to the 

truncated Gaussian pdf 

(0 x < 0 

Af + (x-,e,<f)) = \ Af(x-,e,<j)) ^ n , ( 20 ) 

U c(0/v^) 

where 9 € R. is a location parameter (but not the mean), <j> > 0 
is a scale parameter (but not the variance), and <1\.(-) is the 
complimentary cdf of the Af( 0,1) distribution. In practice, we 
find that L = 3 mixture components suffice, and we used this 
value throughout our numerical experiments in Sec. II VI We 
use a NNGM prior based on its ability to faithfully model 
a wide range of distributions (including multi-modal ones) 
and the ease by which its parameters, ()f lf , 4>^ e }, can 

be accurately tuned using the EM method developed in | [40| 
and discussed in Sec. IIII-CI 

We note that the abundance model described in this section 
treats the abundance coefficients as correlated across pixels 
but statistically independent across materials. Meanwhile, the 
likelihood function described in Sec. III-BI enforces a sum-to- 
one constraint across materials at each pixel. These statistical 


structures are then merged in the posterior. An alternative ap¬ 
proach that allows an abundance prior with correlation across 
pixels and sum-to-one across materials recently appeared in 
[|49l . Implementing this approach in conjunction with AMP is 
an interesting topic for future research. 

III. The HUT-AMP Algorithm 

A. Message Passing and Turbo Inference 

Our overall goal is to jointly estimate the (correlated, non¬ 
negative) endmembers S and (structured sparse, simplex- 
constrained) abundances A from noisy observations Y of the 
bilinear form Z = SA. Using the mean-removed, augmented 
probabilistic models from Sec. [II] the joint pdf of all random 
variables can be factored as follows: 

p(Y,-S,A,E,D)_ 

= p(Y\S,A)p(S,E)p(A,D) (21) 

= PypiYISA)P s [E (S\E) Pe {E) Paid (A\D) Pd (D) (22) 

M+l T / N 

nn hmt £ SmnQ'nt 

m—1 t—1 \ n =1 

N / M 

X n U(SM +I ,r.-I)p. n (en) n fmn^Smni Cmn) 

n=l \ m= 1 

T 

X PdS d n) n 9nt(.Q"nti d n t) 

t=l 

yielding the factor graph in Fig. [I] Due to the cycles within 
the factor graph, exact inference is NP-hard li50) . and so we 
settle for approximate MMSE inference. 

To accomplish approximate MMSE inference, we apply 
a form of loopy belief propagation that is inspired by the 
“turbo decoding” approach used in modern communications 
receivers ED- In particular, after partitioning the overall 
factor graph into three subgraphs, as in Fig. [Q we alter¬ 
nate between message-passing within subgraphs and message¬ 
passing between subgraphs. In our case, BiG-AMP Eol is 
used for message-passing within the bilinear subgraph and 
standard methods from |28l , l29l are used for message-passing 
within the other two subgraphs, which involve N Gauss- 
Markov chains and N binary MRFs, respectively. Overall, our 
proposed approach can be interpreted as a bilinear extension 
of the “turbo AMP” approach first proposed in l34l . 

B. Messaging Between Subgraphs 

For a detailed description of the message passing within 
the Gauss-Markov, MRF, and BiG-AMP subgraphs, we refer 
interested readers to l28l . ||29| , and 11301 . respectively. We now 
describe the message passing between subgraphs, which relies 
on the sum-product algorithm (SPA) (43). In our implemen¬ 
tation of the SPA, we assume that all messages are scaled to 
form valid pdfs (in the case of continuous random variables) 
or probability mass functions (pmfs) (in the case of discrete 
random variables), and we use A^(-) to represent the message 
passed from node b to node c. 

As described in lf43l , the SPA message flowing out of a 
variable node along a given edge equals the (scaled) product 











6 


of messages flowing into that node along its other edges. 
Meanwhile, the SPA message flowing out of a factor node 
along a given edge equals the (scaled) integral of the product 
of all incoming messages times the factor associated with that 
node. Finally, the SPA approximates the posterior of a given 
random variable as the (scaled) product of messages flowing 
into that random variable. 

As discussed in Sec. III-AI a key property of BiG-AMP is 
that certain messages within its sub-graph are approximated 
as Gaussian. In particular, 

A 7 ™” (a) = A f(s; q m „, (24) 

A gTt( a ) (25) 

where the quantities q mn ,i'^ nn ,r' n t,i l P l t are computed during 
the final iteration of BiG-AMP. Thus, the SPA approximated 
posteriors on S mn and a nt take the form 

Ps mn |q mn (® I qmn ; vf ln ) ^ A imn (— V^" 1 (—> 9™™) v run ) (26) 

Pa nt |r nt( a I v nt) {a)Af(a; r nt , < t ), (27) 

where Ag mn (s) and A|”*(a) can be interpreted as priors, 
A/"(s; $"mn, and A/"(a; r nt , u r nt ) can be interpreted as like¬ 
lihoods, and (f26l) and (127b can be interpreted as Bayes rule. 
We will use these properties in the sequel. 

First, we discuss the message-passing between the bilinear 
sub-graph and spectral-coherence sub-graph in Fig. Q] Given 
dm . (l24l) . and the construction of the factor graph in Fig. [j] 
the SPA implies that 

A eZn ( e ) cx J fmn(s, e ) U) ds (28) 

= A f(e-,q mn ,u^ n ). (29) 

The messages in ( l29l > are used as inputs to the Gauss- 
Markov inference procedure. By construction, the outputs of 
the Gauss-Markov inference procedure will also be Gaussian 
beliefs. Denoting their means and variances by ()f nn and 
respectively, we have that 


(e) oc AT(e; 6£„, <?4J 

(30) 

(s) = J f mn (s , e) A e f Z2 (e) de 

(31) 


(32) 


When BiG-AMP is subsequently called for inference on the 
bilinear sub-graph, (l32l > is inserted into ( l26t . i.e., As m "(-) acts 
as the prior on S mn . 

Next we discuss the message-passing between the bilinear 
sub-graph and the spatial-coherence sub-graph in Fig. □ The 
SPA, together with the construction of the factor graph in 
Fig. [Q imply 


A Z 


(d) = 


19nt (a, d) A a »‘ (a) da 


Ed'=±i 19nt (a, d') A a gl\ (a) da : 

Given (IT6l > and (l25l >. we find that 


d e ±1. (33) 


J g nt (a,d) A®"* (a) da 
_ {N(f)\? n t,v r nt )da d = -1 

\/Cn(a)A/'(a;f„ t ,i/^ t )da d= 1 


(34) 


which implies 

a s::(d=+i) = (i+ 


AA(0 -,rnt,v r n 


f C,n{a)Af{a-,? nt ,v r nt ) 


A£:(d=-i) = i-Ag::(d=+i), 


(35a) 

(35b) 


where the fraction in (135al) is BiG-AMP’s approximation of 
the likelihood ratio PY\d nt ( Y \ - l )IVY\6 nt { Y \ + !)• 

The Bernoulli beliefs from (l35l > are used as inputs to the 
MRF-based support-inference procedure. The outputs of the 
MRF inference procedure will also be Bernoulli beliefs of the 
form 


Ag”*(d = + 1 ) = 7r nt (36a) 

A»::(d=-l) = l-7T n t (36b) 

for some ir nt 6 (0,1). The SPA and (fl6l > then imply that 

A I:)(«)oc 9nt(a,d)A 6 g " t t (d) (37) 

d=±l 

= (1-t T nt )5(a) + n nt ( n (a) (38) 


for ( n (-) defined in ( | 1 9l i. When BiG-AMP is subsequently 
called for inference on the bilinear sub-graph, (l38l > is inserted 
into (E3, i.e., AaTt(') acts as t ^ le P r ior on & nt . 


C. EM Learning of the Prior Parameters 
In practice, we desire that the parameters 

Cl = {iO n £, 9 ni , (f) n p }vn£; \jlm Km C*-m Pn}\/n | (39) 

used for the assumed likelihood p y \z mt (ymt \•)> NNGM 
abundance prior Cn(‘)> Gauss-Markov chain p en (•), and binary 
MRF Pd n (-) are well tuned. With this in mind, we propose 
an expectation-maximization (EM) 11521 procedure to tune Cl, 
similar to that used for the GAMP-based sparse-reconstruction 
algorithms in ll53l and ll40l . 

To tune Cl, the EM algorithm |[52l iterates 

Cl i+1 = axgmaxE {In p{E, A, D,Y-, Cl) IFjfi 4 } (40) 

ci 

with the goal of increasing a lower bound on the true like¬ 
lihood p(Y; Cl) at each EM-iteration i. In our case, the true 
posterior distribution used to evaluate the expectation in d40l) 
is NP-hard to compute, and so we use the SPA-approximated 
posteriors p^y(E\Y)_ « n,„,n A fe:(e mn )A^"(e mn ) from 
©-®, p^(D\Y)_OC Un,t A Z( d nt) A t'Mnt) fr ° m 
®-®, and p^y(A\Y) oc ]!„,* A ll\ ( a r»t)A^‘ (a nt ) from 
(l25l > and ( I38| i. Furthermore, since it is difficult to perform the 
maximization in (l40l jointly, we maximize Cl one component 
at a time (while holding the others fixed), which is the well 
known “incremental” variant of EM m. 

The resulting EM-update expressions for the noise and 
NNGM parameters ip, 6^, can be found in l40l . and 
those for the Gauss-Markov chain parameters r] n ,KmCr^ l can 
be found in l39l . They are all computed in closed-form using 
readily available quantities, and thus do not add significantly 
to the complexity of HUT-AMP. The update procedure for the 
binary MRF parameters a n ,/3 n is described in f38j and uses 
gradient descent. Since a small number of gradient-descent 
iterations suffice, this latter procedure does not significantly 
increase the complexity of HUT-AMP. 
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D. EM Initialization 

Since the EM algorithm may converge to a local maximum 
of the likelihood, care must be taken when initializing the 
EM-learned parameters. Below, we propose an initialization 
strategy for HUT-AMP that, based on our empirical experi¬ 
ence, seems to work well. 

We first initialize the endmembers S. For this, we found it 
effective to use an off-the-shelf EE algorithm like VC A 0 
or FSNMfH [ 10] to recover S . Then, as described in 0, 
we subtract the observation mean u from S to obtain the 
initialization S_ . 

With the aid of 5 , we next run BiG-AMP under 

1 ) the trivial endmember prior 

A s™”(s) = (41) 

which essentially fixes the endmembers at S_ , 

2) the agnostic NNGM abundance initialization from l40l : 


£=1 

(42) 

with i set at the best fit to a uniform 

distribution on the interval [ 0 , 1 ] and 7 r° t = 4, and 

3) the noise variance initialization from ED: 


'dr = 
r'm 


\m 


(SNR^ + 1 )MT 


Vra, 


(43) 


where, without any prior knowledge of the true 
SNR m = E{|z mt | 2 }/^ m , we suggest SNR^ = 10 dB. 

By running BiG-AMP under these settings, we initialize 
the messages and Afrom (l24t - (l25l ) and we 

also obtain an initial estimate of A from the mean oi^the 
approximate posterior 427]!, which we shall refer to as A . 

Finally, we initialize the remaining parameters in il. Starting 
with the spectral coherence parameters, we set the mean 
and variance (cx 2 ) 0 at the empirical mean and variance, 
respectively, of the elements in the nth column of S_ . Then, 
we initialize the correlation rj n as suggested in (39) . i.e., 




o 

m 



2 

F 


= 1 -V 

M - 1 4A 


M 1 1 v T y 


.m—m+1 1 


1 *—' ,„0 II 4 

m=l Em K 1 


0 M 2 

F 


(44) 

for n = 1,... ,N, (45) 


where y J denotes the mth row of Y_. Lastly, we initialize 
the spatial coherence parameters as suggested in l38l . i.e., 
/3° = 0.4 and a° = 0.4, since f3~8l shows these values to 
work well over a wide operating range. 


2 With FSNMF (which was used for all of the experiments in Sec. Hvl we 
found that it helped to post-process the observations to reduce the effects of 
noise. For this, we used the standard PCA-based denoising approach described 
in 0: the signal subspace was estimated from the left singular vectors of Y 
after row-wise mean-removal, and the FSNMF-estimated endmembers were 
projected onto the signal subspace. 


Definitions: 



A f. = (A /“"(-)}Vmn 


— {Aedl (')lVmn 

A g = {4s™ (-)}Vmn 

Af 

— (A ^(.)} Vm „ 

A l ± {A&*(-)} V nt 

A? 

= {A UKOlVnt 

a d = { A d 9 ::(-)}v nt 

Ag 

— {Ag"* (9}Vni 


Initialize A^,, A^, and f2° as described in Sec. EEd] 
for * = 1,2,3,... do 

F g4) 

and (ID 

dD and GD 


convert Ap to A l 
convert A^ to A^ 


via i 
via i 


Ar = GaussMarkov(A£ , ST) 

= MRF(Ag\lT) 

convert A^ to Ag via GD and (32) 
convert A^ to Avia CD and CD 
ft* = EM(Ag , Af.. Ap , , A®, Ag, f2 i_1 ) 

[A-§,, Ag] = BiGAMP(Ag , A®, Cl i ) 
end for 


TABLE I 

HUT-AMP PSEUDOCODE FOR FIXED NUMBER OF MATERIALS N. 


E. HUT-AMP Summary 

We now describe the scheduling of turbo-messaging and 
EM-tuning steps, which together constitute the HUT-AMP 
algorithm. Essentially, we elect to perform one EM update 
per turbo iteration, yielding the steps tabulated in Table [0 As 
previously mentioned, the “BiGAMP” operation iterates the 
BiG-AMP algorithm to convergence as described in |30], the 
“GaussMarkov” operation performs standard Gauss-Markov 
inference as described in (28), and the “MRF” operation 
performs MRF inference via the belief-propagation method 
described in (29) . 


F. Selection of Model Order N 

In practice, the number of materials N present in a scene 
may be unknown. Previous approaches such as the hyperspec- 
tral signal subspace identification by minimum error (HySime) 
11551 . and a Neyman-Pearson detection theory-based thresh¬ 
olding method (HFC) Il56l directly address the problem of 
estimating the number of materials N. 

As an alternative, we apply a standard penalized log- 
likelihood maximization (42) method to estimate N from the 
observed data Y. Specifically, we aim to solve 

N = argmax21npy |z (Y|5j V A 7 v;^ML) - 7(^0, (46) 

N 

where S_ N and Aj v are the estimates of the mean-removed 
endmembers and abundances returned from V-material HUT- 
AMP, ip ML is the ML estimate of the noise variance, and 7 (-) 
is a penalty term. As recommended in [48), we choose 7 (-) in 
accordance with the small-sample-corrected Akaike informa¬ 
tion criterion (AICc) (42), i.e., 7 (N) = 2 
where MT is the number of scalar observations in Y and 
n(N) is the number of scalar degrees-of-freedom (DoF) 
in our model, which depends on N. In particular, n(N) 
comprises MN DoF from S, (N — 1)T DoF from A, and 
5 N + 2 NL + N(L — 1) + M DoF from $2. Plugging the 
standard form of the ML estimate of if (see, e.g., (42l eq. 













A 


(7)]) into ( l46l ). we obtain 

2MTn{N) 
MT-n{N)- 1' 
(47) 

To solve the maximization in (l47l >. we first run TV = 2 
HUT-AMP to completion and compute the penalized log- 
likelihood. We then increment TV by 1, and compute the 
penalized log-likelihood again. If it increases, we increment 
TV by 1 and repeat the procedure. Once the penalized log- 
likelihood decreases, we stop the procedure and select the 
previous model order TV, which is the local maximizer of the 
penalized log-likelihood. We refer to the resulting procedure as 
“HUT-AMP with model-order selection’' (HUT-AMP-MOS). 

We also note that a similar model-order selection strategy 
can be implemented to tune the number of NNGM components 
L used in ( | 1 9k and we refer interested readers to 153) for more 
details. We note, however, that the fixed choice L = 3 was 
sufficient to yield the excellent results in Sec. [TV] 

IV. Numerical Results 

In this section, we report the results of several experiments 
that we conducted to characterize the performance of our 
proposed methods on both synthetic and real-world datasets. 

In these experiments, we compared the endmembers S re¬ 
covered from our proposed HUT-AMP and HUT-AMP-MOS0 
unmixing algorithms to those recovered by the Bayesian 
unmixing algorithm SCU ll26l : the sparse NMF techniques 
L±NMF m and SDSNMF d; and the endmember extrac¬ 
tion (EE) algorithms VCA Q, FSNMF ifTOl, and MVSA ifTTil . 

We also compared the abundances A recovered by our pro¬ 
posed HUT-AMP and HUT-AMP-MOS unmixing algorithms 
to those recovered by SCU and SDSNMF, as well as those 
recovered by FCLS (O (implemented via Matlab’s lsqlin) 
and SUnSAL-TV Iil6l using the endmember estimates pro¬ 
duced by VCA, FSNMF, and MVSA. We note that SCU, 
SDSNMF, and SUnSAF-TV all exploit spatial coherence, and 
that SDSNMF is in fact LiNMF with additional mechanisms 
to exploit spatial coherence. 

In all cases, algorithms were run using their authors’ imple¬ 
mentation and suggested default settings, unless noted other¬ 
wise. The only exceptions are SDSNMF and LiNMF, which 
we implemented in MATLAB since their authors declined to 
provide source code. All algorithms (with the exception of 
HUT-AMP-MOS) were supplied the true number of materials 
TV in each experiment. For SUnSAL-TV, the regularization 
weights for the t\ and TV norms were hand-tuned, because 
cross-validation tuning was too computationally expensive 
given the sizes of the datasets. For FSNMF, we used the PCA 
post-processing described in footnote [2] to reduce the effects 
of measurement noise, since this greatly improved its mean- 
squared estimation error. 

A. Pixel Purity versus Abundance Sparsity 

Our first experiment aims to assess EE performance as 
a function of pixel purity and abundance sparsity. Our mo¬ 
tivation stems from the fact that the proposed HUT-AMP 

3 Matlab code can be found at http://www.ece.osu.edu/~schniter/HUTAMP 


TV = arg max —MT In 

N \ 




n-H-nWf 


MT 



Fig. 4. Illustration of the non-negative endmember matrix S and the K- 
sparse P-pure abundance matrix A for the first experiment. 


algorithm aims to exploit sparsity in the columns of the 
abundance matrix A, while classical EE techniques like VCA 
and FSNMF aim to exploit the presence of pure pixels, 
recalling the discussion in Sec. Q] Thus, we are interested in 
seeing how these contrasting approaches fare under varying 
combinations of pixel purity and abundance sparsity. We also 
compare against the minimum-volume-simplex approach from 
GD, which is an alternative to both pixel purity and abundance 
sparsity. 

We first constructed synthetic data consisting of M = 100 
spectral bands, T =115 spatial pixels, and TV = 10 materials. 
The endmember matrix S € R^ xAr was drawn i.i.d such that 
S m „ ~ 7V+(0.5, 0.05). The abundance matrix A £ R^ xT was 
generated as shown in Fig. Q] where P of the columns of A 
were assigned (uniformly at random) to be pure pixels, and 
the remaining columns were drawn /\-sparse on the simplex. 
In particular, for each of these latter columns, the support was 
drawn uniformly at random, and the non-zero values {a k 
were drawn from a Dirichlet distribution, i.e.. 


f r(o <k) 

P(Qa,---,a K _ i) = < J (q)K 


n K CL 

fc=l Hfc 


a-1 
■k 1 


£ [ 0 > 1 ] 


0 


p(i 


■K\—li ■ 


-K 


-l) = <5(1 - SLi - 


else 

Ik)> 


(48a) 

(48b) 


where r(-) denotes the gamma function, with concentration 
parameter a = 1. Finally, the observation matrix Y was 
created by adding white Gaussian noise W to Z = SA, 
where the noise variance ip was adjusted to achieve SNR = 
M T \\Zf F /iP = 80 dB. 

Figure^ shows empirical success probability averaged over 
R = 100 realizations, as a function of pixel purity P and 
sparsity K, for the HUT-AMP, MVSA, VCA, and LiNMF al- 
gorithmsQlt does not show FSNMF since its performance was 
indistinguishable from VCA’s performance. Here, a recovery 
was considered successful if NMSEg = ||S — <S , ||^/||S'||^ < 
-40 dB. As seen in Fig. 0c) and Fig. 01), VCA and FSNMF 
were only successful for the K = 1 and P = 10 cases, i.e., the 
pure-pixel cases. LiNMF did slightly better, with successful 
recovery for K <2. HUT-AMP, on the other hand, was able 
to successfully recover the endmembers for K < 6-sparse 
abundances, even when there was only P = 1 pure-pixels 
available. We attribute HUT-AMP’s improved performance to 
its exploitation of sparsity rather than pure pixels (as with 


4 Since our experimental findings into sparsity-versus-purity would be 
biased if the algorithms under test used different approaches to the exploitation 
of spatial and/or spectral coherence, we turn off the coherence-exploiting 
mechanisms in HUT-AMP and SDSNMF (reducing the latter to L^NMF) and 
compare to other algorithms that do not exploit spatial or spectral coherence: 
VCA and MVSA. " 
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(a) HUT-AMP 



Pixel purity P 
(c) VC A 



2 4 6 8 10 


Pixel purity P 


(b) MVS A 



2 4 6 8 10 


Pixel purity P 


(d) L±NMF 



2 4 6 8 10 


Pixel purity P 


(a) HUT-AMP (b) MVSA 



Pixel purity P Pixel purity P 


(c) VC A (d) LiNMF 



Pixel purity P Pixel purity P 


Fig. 5. First experiment: Average success rate for near-perfect recovery 
of i.i.d endmembers S and X-sparse and P-pure abundances A using (a) 
HUT-AMP, (b) MVSA, (c) VCA, and (d) LiNMF. 


VCA and FSNMF), and its ability to accurately learn the 
underlying sparsity rate. Also, we conjecture that sparsity (i.e., 
K > 1 and P < TV ) is more important in practice, since 
the spatial resolution of the hyperspectral sensors may not 
guarantee pixel-purity for all materials, while sparse abun¬ 
dances (i.e., K -C TV) are more likely to hold. Finally, we 
note that, although MVSA performed remarkably well in this 
experiment, it performed relatively poorly for the experiments 
in Sec. IW-Bl through Sec. UV^Dl 

Next, we repeat the previous experiment at SNR = 60 dB 
with S randomly selected from the USGS Digital Spectral 
Library splibO6a0 which contains laboratory-measured re¬ 
flectance values for various materials over M = 224 spectral 
bands. In particular, for each Monte-Carlo realization we 
randomly select TV = 10 endmembers from the library such 
that mini^j SAD(si, Sj) > 15 degrees, for spectral angle 
distance 

SAD (si,3j) = arccos ( it—) (49) 

V 11 112 11 Sj 11 2 / 

Figure [6] shows the empirical success probability for the HUT- 
AMP, MVSA, VCA, and L^NMF algorithms. Although HUT- 
AMP’s performance with USGS endmembers is not as good 
as with i.i.d. endmembers, it still outperformed VCA and 
LiNMF. As before, MVSA has the best performance. 

Finally, we perform a variation on the previous experiment 
that again uses randomly selected USGS endmembers. But 
rather than using pure and/or /f-sparse abundance vectors, it 
uses fully mixed abundances whose TV coefficients were gener¬ 
ated from a Dirichlet distribution with concentration parameter 
a (recall (l48l) l. Recall that larger values of a correspond to 
more dense mixing. Figure [7] reports the average NMSEs 
of HUT-AMP, MVSA, VCA, and LiNMF versus both the 

5 See http://speclab.cr.usgs.gov/spectral.lib06/ds231/ 


Fig. 6. Second experiment: Average success rate for near-perfect recovery 
of USGS endmembers S and X-sparse and P-pure abundances A using (a) 
HUT-AMP, (b) MVSA, (c) VCA, and (d) LiNMF. 


(a) HUT-AMP 


(b) MVSA 



# materials N 
(c) VCA 


# materials N 
(d) LiNMF 



# materials N 


# materials N 


Fig. 7. Third experiment: Average NMSEg for recovery of TV USGS 
endmembers S' with abundances A drawn from Dirichlet distribution with 
concentration a using (a) HUT-AMP, (b) MVSA, (c) VCA, and (d) L^NMF. 


number of materials, TV, and the concentration parameter, a, 
at SNR = 40 dB. The figure shows that HUT-AMP, VCA, 
and L-iNMF gave similar performance overall, with small 
advantages to HUT-AMP when N < 8 and a < 10 -13 / 8 . 
Relative to the other algorithms, MVSA tolerated higher values 
of a, but was more sensitive to larger numbers of materials, 
TV, when a was small. 


B. Pure-Pixel Synthetic Abundances 

The second experiment uses synthetic pure-pixel abun¬ 
dances A with endmembers S chosen from the USGS Digital 
Spectral Library. To construct the data, we partitioned a 

























































10 



Fig. 8. False-color image of the noiseless measurements Z used for the 
second experiment. Since the pixels are pure, each strip shows the false color 
of one of the N = 5 materials. They are, in order from left to right: Grossular, 
Alunite, wxl Kaolinite, Hydroxyl-Apatite, and Amphibole. 


-©- HUT-AMP 
-0-SCU 

—B— SDSNMF 
-~k- FSNMF+FCLS 

FSNMF+SUnSAL-TV' 
-X- VCA+FCLS 
- X - VCA+SUnSAL-TV 
MVSA+FCLS 
-J$c- MVSA+SUnSAL-TV 



SNR [dB] 


Fig. 9. NMSEg vs. SNR for the synthetic pure pixel dataset. 


scene of T = 50 x 50 pixels into N = 5 equally sized 
vertical strips, each containing a single pure material. We then 
selected endmembers corresponding to the materials Grossular, 
Alunite, well crystallized (wxl) Kaolinite, Hydroxyl-Apatite, 
and Amphibole, noting that similar results were obtained 
in experiments we conducted with other materials. Figure [8] 
shows a false-color image constructed from the noiseless 
measurements Z. We then vary the SNR on a grid from 15 dB 
to 35dB by adding white Gaussian noise. 

Averaging over r = 50 realizations. Figures HI and [TO] 
show the normalized mean-squared error of the estimated 
endmembers and abundances, respectively (i.e., NMSEs and 
NMSEa = ||A — A||pi/||A||^), while Fig. [TT] shows the 
average runtime versus SNR. For this pure-pixel dataset, 
these figures show HUT-AMP dominating the other algorithms 
in both endmember and abundance estimation accuracy at 
all SNRs. In particular, HUT-AMP outperformed the best 
competing techniques by 4 to 12 dB in NMSEs and as much 
as 90 dB in NMSE^. We note that the biggest gains in 
NMSEa occurred when SNR > 24 dB. 

We attribute HUT-AMP’s excellent NMSE to several fac¬ 
tors. First, it has the ability to jointly estimate endmembers and 
abundances, to exploit spectral coherence in the endmembers, 
and to exploit both spatial coherence and sparsity in the 
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Fig. 10. NMSEa vs. SNR for the synthetic pure pixel dataset. 
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Fig. 11. Runtime vs. SNR for the synthetic pure pixel dataset. 


abundances (of which there is plenty in this experiment). 
Furthermore, due to the presence of pure-pixels throughout 
the scene, the “active” distribution £„(•) in ( fl9t is simply a 
Bernoulli distribution, which HUT-AMP is able to learn (via 
EM) and exploit (via BiG-AMP) for improved performance. 

Figure [TT] shows that the runtime of HUT-AMP was ap¬ 
proximately 3 times slower than the EE-and-inversion tech¬ 
niques, approximately 4 times faster than the spatial-coherence 
exploiting SDSNMF algorithm, and approximately 200 times 
faster than that of the Bayesian SCU algorithm. We conjecture 
that the relatively slow runtime of SCU is due to its use of 
Gibbs sampling. 

Although not shown in the above figures, we also ran HUT- 
AMP-MOS on this dataset at SNR = 30 dB. The result 
was that HUT-AMP-MOS correctly estimated the number 
of materials (i.e., N = 5) on every realization and thus 
gave identical NMSEg and NMSE/i as HUT-AMP. The total 
runtime of HUT-AMP-MOS at this SNR was 94.27 seconds, 
which was about 9 times slower than HUT-AMP but still 75 
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times faster than SCU. 

C. SHARE 2012 Avon Dataset 

Next, we evaluated algorithm performance on the 
SHARE 2012 Avon dataseO ED, which uses M = 360 
spectral bands, corresponding to wavelengths between 400 and 
2450 nm, over a large rural area. To do this, we first cropped 
down the full image to the scene shown in Fig. [121 which 
is known to consist of N = 4 materials: grass, dry sand, 
black felt, and white TyVek f58ll . This scene was explicitly 
constructed for use in hyperspectral unmixing experiments, 
as efforts were made to ensure that the vast majority of the 
pixels were pure. Also, the data was collected on a nearly 
cloudless day, implying that shadowingeffects were minimal. 
To construct ground-truth endmembersO we averaged a 4 x 4 
pixel grid of the received spectra in a “pure” region for 
each material. We then computed SAD between each ground- 
truth endmember s n and the estimate s n produced by each 
algorithm. 

Table QI] shows median SAD over 50 realizations, using 
the original dataset and one with white Gaussian noise added 
to achieve SNR = 25 dB. In the noiseless case, the table 
shows that HUT-AMP recovered the grass and white TyVek 
materials with the highest accuracy and recovered the dry 
sand and black felt materials with the second highest accuracy. 
Meanwhile, in the noisy case, HUT-AMP recovered the TyVek 
material with the highest accuracy (in a tie with SDSNMF) 
and recovered the dry sand and black felt materials with the 
second highest accuracy. Fooking at the material-averaged 
SAD scores in the table, it is evident that the accuracies 
achieved by HUT-AMP are close to those attained by the most 
accurate algorithm, SDSNMF, and significantly better than 
those attained by any of the competing algorithms, in both the 
noiseless and noisy cases. Although SDSNMF offers slightly 
more accurate endmember recoveries. Fig. M shows that its 
runtime is 44 times slower than that of HUT-AMP. Therefore 
we conclude that HUT-AMP offers an excellent combination 
of endmember recovery accuracy and runtime. 

For visual comparison. Fig. [13] shows an example of the 
extracted and ground-truth endmembers in the noiseless case. 
The figure shows HUT-AMP’s estimates closely matching 
the ground-truth for all materials; by contrast, MVS A is 
mismatched in the case of grass, MVS A and FSNMF are 
mismatched in the case of dry sand, MVSA, VCA, and 
FSNMF, are mismatched in the case of black felt, and MVSA, 
VCA, SCU, FSNMF, and MVSA are all mismatched in the 
case of white TyVek. Figure fl3l reveals that MVSA does not 
always yield non-negative endmembers estimates, which may 
account for its relatively poor performance in all but our first 
experiment from Sec. II V-AI 

As another visual comparison. Fig. fl4l shows an example of 
the recovered abundance maps in the noiseless case. We reason 

6 The SHARE 2012 Avon dataset can be obtained from 
http://www.rit.edu/cos/share2012/ 

7 In practical HU data, ground truth is difficult to obtain, since lab-measured 
reflectivity can differ dramatically from received radiance at the sensor. In this 
experiment, we circumvent these problems by exploiting the known purity of 
the pixels and by minimizing noise effects through averaging. 



Fig. 12. False-color image of the cropped scene of the SHARE 2012 dataset 
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11.72 

2.17 

4.30 

CN 

FSNMF 

1.58 

14.47 

4.39 

1.65 

5.52 

cc 

MVSA 

4.52 

11.16 

48.05 

1.58 

16.33 

z 

CO 

SCU 

1.82 

2.03 

8.24 

1.73 

3.47 


SDSNMF 

1.99 

0.81 

3.37 

0.44 

1.65 


TABLE II 

Median spectral angle distance (in degrees) between 
RECOVERED AND GROUND-TRUTH ENDMEMBERS IN THE SHARE 2012 
EXPERIMENT. 


that the best recoveries are the ones that are the most pure 
within the green, tan, black, and white regions of Fig.[l2l given 
that great care was taken during data collection to keep each 
region occupied by a single material. Figure fl4l shows that, in 
the case of dry sand and black felt, the abundances recovered 
by HUT-AMP were the most pure and, in the case of grass and 
Tyvek, the abundances recovered by HUT-AMP were among 
the most pure. The other Bayesian approach, SCU, yielded 
abundance estimates with much less purity, and we conjecture 
that was due to its priors being less well-matched to this 
highly sparse scene. Meanwhile, SUnSAF-TV (using both EE 
techniques) failed to recover the black felt material, which we 
attribute to its lack of a sum-to-one constraint. 

Average runtimes are also reported next to each algorithm 
in Fig. M There we see that HUT-AMP’s runtime was 4- 
9 times slower than the EE-and-inversion techniques but 44 
times faster than SDSNMF and 166 times faster than SCU, 
the other Bayesian technique. 

We also ran HUT-AMP-MOS on the SHARE 2012 dataset 
and found that it correctly estimated the presence of N = 4 
materials, thus yielding identical recovery performance to 
HUT-AMP. HUT-AMP-MOS’s runtime was 36.54 seconds, 
which was 2.5 times slower than HUT-AMP but still 67 times 
faster than SCU. 
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Fig. 13. Examples of recovered and ground-truth endmembers for the 
SHARE 2012 experiment. 


D. AVIRIS Cuprite Dataset 

Our final numerical experiment was performed on the 
well known AVIRIS Cuprite dataset. Although the original 
dataset consisted of M = 224 spectral bands, ranging from 
0.4 to 2.5 pm, we aimed to replicate the setup in |26| , 
which removed bands 1-10, 108-113, 153-168, and 223-224 to 
avoid water-absorption effects, resulting in M = 189 spectral 
measurements per pixel. And, like l26l . we considered only 
the 80 x 80 pixel scene identified by the black square in 
Fig. Q3] and we assumed N = 5 materials. According to the 
tricorder classification map in Fig. El this scene contains 
the materials Montmorillonite, Alunite, well crystallized (wxl) 
Kaolinite, and partially crystallized (pxl) Kaolinite. Although 
|26] conjectured that this area also contains Sphene, none of 
the algorithms produced endmember estimates that were close 
to Sphene, and is Sphene is not listed in Fig. [15] Thus, we did 
not consider Sphene as a ground-truth material. Also, like in 
[26], we considered both noiseless and white-Gaussian-noise 
corrupted measurements (at SNR = 30 dB). 

Table Hill shows the median SAD achieved during endmem¬ 
ber extraction over 50 realizations. From the table, we see that, 
in the noiseless case, HUT-AMP achieved the best material- 
averaged SAD as well as the best SAD for two specific 
materials. In the noisy case, HUT-AMP achieved the second- 
best material-averaged SAD as well as the best SAD for one 


(a) HUT-AMP (average runtime = 14.68 sec): 



(b) VCA+FCLS (average runtime = 2.50 sec): 



(c) VCA+SUnSAL-TV (average runtime = 4.13 sec): 



(d) FSNMF+FCLS (average runtime = 1.67 sec): 



(e) FSNMF+SUnSAL-TV (average runtime = 3.36 sec): 
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(f) MVSA+FCLS (average runtime = 1.82 sec): 
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(g) MVSA+SUnSAL-TV (average runtime = 4.08 sec): 


0 
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(h) SCU (average runtime = 2438 sec): 



(i) SDSNMF (average runtime = 642.8 sec): 



Fig. 14. Average runtimes and examples of recovered abundance maps for 
the SHARE 2012 experiment. From left to right, the materials are: grass, dry 
sand, black felt, and white TyVek. 


material. Meanwhile, the SADs produced by VCA, FSNMF, 
and SDSNMF were of a similar magnitude, while those 
produced by SCU and MVS A were noticeably larger. These 
SAD values should be interpreted with caution, however, 
since i) the ground-truth endmembers are laboratory-measured 
reflectance spectra from the 2006 USGS library as ground- 
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Fig. 15. Mineral classification mapping of the Cuprite Dataset using the 
Tricorder 3.3 product ns. We used the scene cropped by the black rectangle. 




SAD [degrees] 

HUT-AMP 

SCU 

VCA 

FSNMF 

MVS A 

SDSNMF 

noiseless 

Montmor. 

3.42 

3.95 

3.91 

3.54 

6.03 

3.47 

wxl Kaolinite 

10.23 

13.14 

10.45 

10.86 

15.42 

11.46 

pxl Kaolinite 

9.10 

11.45 

9.22 

9.38 

10.51 

9.09 

Alunite 

7.27 

6.62 

6.55 

6.40 

7.11 

7.87 

Average 

7.50 

9.12 

7.53 

7.55 

9.77 

7.97 

SNR = 30 dB 

Montmor. 

3.53 

3.80 

3.79 

3.57 

5.64 

3.48 

wxl Kaolinite 

10.72 

12.46 

10.62 

12.93 

15.59 

11.20 

pxl Kaolinite 

9.10 

11.47 

9.32 

10.49 

11.55 

9.43 

Alunite 

7.45 

7.94 

6.60 

6.39 

7.16 

6.67 

Average 

7.70 

8.92 

7.59 

8.34 

9.98 

7.70 


TABLE III 

Median spectral angle distance (in degrees) for the Cuprite 

EXPERIMENT. 


truth, whereas the Cuprite dataset itself uses reflectance units 
obtained via atmospheric correction of radiance dataf] and ii) 
it is not clear exactly which materials are truly present in the 
scene. The fact that the SADs reported here are so much larger 
than those reported in our SHARE experiment suggests that 
the Cuprite ground-truth may not be fully accurate. 

For visual comparison, we plot examples of the abundance 
maps recovered in the noiseless experiment in Fig. [16] The 
figure shows that the abundance maps returned by HUT-AMP, 
SDSNMF, FSNMF+FCFS, VCA+FCFS, FSNMF+SUnSAL- 
TV, and VCA+SUnSAF-TV have the highest contrast, sug¬ 
gesting that if certain pixels are truly pure then these al¬ 
gorithms are accurately estimating those pixels. The maps 

8 The reflectance and radiance versions of the Cuprite dataset can be found 
at http://aviris.jpl.nasa.gov/html/aviris.freedata.html 


produced by SUnSAF-TV appear more “blurred,” probably 
as an artifact of TV regularization. The abundances returned 
by SCU, MVSA+FCFS, and MVSA+SUnSAF-TV were of 
much lower contrast and suggest different material placements 
than the maps generated by the other algorithms. For example, 
SCU suggests a significant wxl-Kaolin presence throughout 
the lower half of the scene, in contrast to other algorithms. 
However, Table [III] shows that SCU gave the worst SAD for 
wxl-Kaolin. 

Figure [16] also shows the total runtimes of the various 
algorithms. There we see that HUT-AMP was 6-8 times slower 
than the typical EE-and-inversion approach, but more than 80 
times faster than SCU and more than 200 times faster than 
SDSNMF. 

We also ran HUT-AMP-MOS on the Cuprite data and found 
that, in both the noiseless and noisy cases, it estimated the 
presence of TV = 5 materials, and thus returned identical 
estimates to HUT-AMP Meanwhile, HUT-AMP-MOS gave an 
average runtime of 191.49 seconds, which was 30 times faster 
than SCU and 75 times faster than SDSNMF. 

V. Conclusions 

In this paper, we proposed a novel empirical-Bayesian 
hyperspectral-unmixing algorithm that jointly estimates end- 
members and abundance maps while exploiting the practical 
features of spectral and spatial coherence, as well as abundance 
sparsity. Inference is performed using the “turbo” approach 
proposed in ll34l , which breaks up the factor graph into 
three subgraphs, performs (loopy) BP individually on each 
subgraph, and then exchanges beliefs between subgraphs. For 
the spectral and spatial coherence subgraphs, standard Gauss- 
Markov and discrete-Markov methods [(28l . |]29| , respectively, 
are used, while for the non-negative bilinear-mixing subgraph, 
the recently proposed BiG-AMP method from f30l is used, 
which exploits the approximate message passing framework 
from El, El- Furthermore, the statistical parameters of all 
distributions are learned using expectation-maximization HD, 
and the number of materials in the scene is estimated using 
penalized log-likelihood maximization. On the whole, the 
proposed HUT-AMP-MOS algorithm performs approximate 
MMSE inference that exploits spectral and spatial coherence, 
in addition to simplex constraints, while avoiding the need for 
the specification of any tuning parameters. 

Through a detailed numerical study, we demonstrated that 
our proposed HUT-AMP algorithm yields accurate recoveries 
of both endmembers and abundances on both synthetic and 
real-world datasets. In particular, we found that HUT-AMP 
gives recoveries that are close to—if not more accurate than— 
state-of-the-art unmixing algorithms like SDSNMF. Mean¬ 
while, the runtime required for HUT-AMP is much less 
than sophisticated spatial-coherence exploiting approaches like 
SDSNMF and SCU—often by several orders of magnitude— 
while within an order of magnitude of the fastest EE-and- 
inversion approach. Our experiments also demonstrated that 
our model-order selection technique was able to correctly 
estimate the number of materials in several synthetic and 
real-world datasets, without requiring a very large increase 
in runtime. 
























MV S A+SUnS AL-TV MVSA+FCLS SCU VCA+SUnSAL-TV FSNMF+SUnSAL-TV VCA+FCLS FSNMF+FCLS SDSNMF HUT-AMP 
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Fig. 16. Examples of recovered abundance maps in the noiseless Cuprite experiment. Each row corresponds to an algorithm and each column corresponds 
to a material. Average runtimes (in seconds) are also listed on the left. 
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Appendix A 
Mean removal 

We can see that S_ from £7} is approximately zero-mean via 

M T 

0 =- V V y (50) 

M-mt v ’ 

m= 1 t= 1 

1 M T N M T 

m—1 t— 1 n= 1 m=l t— 1 

0(1) 0(l/iV) 

TV T M 

~ ant ~M -mn> ( 52 - ) 

n=l £=1 m—1 

Si v ✓ 

4 u a 

Un 

where (f50t follows from the definitions ©-©. The under¬ 
braces in dlD show the scaling on each term in the large- 
system limit (i.e., as N —> oo). These particular scalings 
follow from our assumption that the noise is both zero-mean 
and white and the convention lf30l that both y mt and the noise 
variance t/i scale as 0(1). Recalling that 1 AC = 1 due to 
the simplex constraint, expression <f52~b shows that a weighted 
average of elements in S_ is approximately zero, where the 
approximation becomes exact in the large-system limit. 
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